test: expand and optimize regression testing #1918
Conversation
I played around with this a bit and here are some observations. I'm giving relative numbers of errors because I'm running these tests on Windows, and there are a small number of additional failures on my setup that are due to very minor font differences. Not sure if this is a general Windows issue or specific to my configuration (which is pretty basic), but since there are a small number (around 30) it doesn't affect things much at the moment.
I think this is well worth pursuing and I'm happy to put some time into it.
Hey! Thanks for chiming in!
I've noticed this as well. Later, I would like for us to update the tests to use the default config. I didn't do this yet because one of the W3C tests would also fail, and I haven't set aside time to look at it yet.
Never tried this, so thanks for putting your computer through that and sharing the result. Do you remember how long that took to run compared to non-multipass?
Since implementing this, I've only resolved one issue so far, in #1916. After that, I'd only reviewed the ones that I knew #1892 "broke"; that's the increase of 12 you noticed. I reviewed each one and deemed them false positives.

We need to make the test runner less sensitive, imo. When testing SVGs like this, we must leave some leeway due to antialiasing issues, and could have some leeway for unperceivable differences. I'm not sure if this one came from antialiasing or if a few pixels actually changed, so we can investigate that further. I allowed it because zoomed in, full-screened, on a 3440x1440 monitor, I couldn't make out even a subtle difference. Something interesting as well: in those 12 images, the tests pass if we remove the

For clarity, the goal is to have zero observable difference. Sometimes a few pixels change due to antialiasing, which can't be helped. I do want us to enforce against subtle differences too, but only if a human can see it after being told where to look and flipping between the two versions.
That's great, thanks for reviewing the results too. If you need any help or find you're too busy, don't be afraid to let us know here and someone else can take a look too.
Very roughly twice as long, but since I'm usually doing other work on the computer while these run, the times vary quite a bit, so this is just an estimate.
Possibly, but I suspect at least some of these mismatches are due to real problems that happen to manifest in a subtle way in this instance. The W3C tests seem to have very few, if any, false positives. Probably this will eventually come down to a philosophical question of "what is an observable difference?". Can you give me an example of one that you deemed a false positive? That would help me understand your position on the philosophical question :)

So I agree that we may eventually want to make this less sensitive, once we get to the point where the false positives dominate the mismatches, but that doesn't seem to be a problem at the moment.
I didn't mean that all the current issues were false positives; in fact, I suspect that most are real bugs. I just meant we certainly have some false positives, for example the
Provided above! Feel free to try to optimize it locally and let me know if you disagree.
I'm not convinced yet that the im-google mismatches are false positives (though they may be in the end). The tests pass if you run them with

I haven't tried to figure out why there's a difference. Maybe it will never be a "noticeable" difference, and it certainly doesn't seem like the biggest problem at the moment, but personally I don't understand the underlying cause enough to feel comfortable that this is not a symptom of a potentially bigger problem.
I noted that
That's why I linked the PR that I did, which is the one that added that option. To reiterate, all optimizations can make very minor differences that a machine can detect. This is mainly about antialiasing; other cases include subpixel rounding, which we do a lot of throughout SVGO. But as a tool, we work for humans, not machines. As such, the final call should be made by human eyes. Have you tried opening

The reason I'm saying we're too strict is that we check for a fixed number of pixel changes on top of the tolerance, which is a flawed way to check for antialiasing or other issues. If a single pixel changes, that is the expectation, not a bug.

For clarity, we already have tolerances set in our test runner, and other SVG optimizers do the same:
The problem is that when 0.1 wasn't enough, it was increased by checking the number of pixels instead of increasing the

We have other mismatches (one of the SVGs like

In other words, our testing methodology is flawed: we allow 4 mismatching pixels per image, but if we had one document containing the same image twice, side by side, it would suddenly fail. That isn't what we're going for.
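To make the argument concrete, here is a minimal sketch of the proportional approach being described, assuming a pixelmatch-based comparison like the runner uses; MISMATCH_RATIO and isAcceptable are hypothetical names, not code from this PR:

```js
const pixelmatch = require('pixelmatch');

// Hypothetical: cap mismatches as a fraction of total pixels rather than
// a fixed count, so the allowance scales with image size.
const MISMATCH_RATIO = 0.0001; // e.g. 0.01% of pixels may differ

function isAcceptable(expected, actual, width, height) {
  // threshold is the per-pixel colour tolerance discussed above (0.1)
  const mismatches = pixelmatch(expected, actual, null, width, height, {
    threshold: 0.1,
  });
  // A document containing two copies of a passing image still passes,
  // because the allowance doubles along with the pixel count.
  return mismatches <= Math.ceil(width * height * MISMATCH_RATIO);
}
```

Under a scheme like this, the side-by-side counterexample above no longer fails, since the budget grows with the area being compared.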
OK. What do you think about merging this change along with a change in regression.js:
I think these new tests are really useful, and having them in the main branch would help minimize errors. Hopefully we can reduce MAX_MISMATCHES over time.
I'm fine with the idea, but not that implementation, mostly because it'll be unclear which images were broken if someone needs to review it after. If we were to merge this eagerly, I'd prefer to add the list to our skip array instead while we work on fixing them:

I hoped to only merge this once we'd improved performance, but the repo is getting so active now that it might be best to merge it sooner. ^-^' We can probably hold off until we've fixed the broken tests, though. I'll be jumping between reviewing pull requests and resolving these tests, but hopefully soon we'll be in a position to merge it.
The exclude list is more work, but probably worth it. From what I've found in the small number of problems/plugins I've looked at in detail, there's a ton of cleanup that should be done before we think about adding new features, so I think it would be good to get this in the main branch.

One thing that I think would be helpful, both when fixing issues and working on new code, is some sort of guidance on best practices for these plugins. The Plugin Architecture doc gives a start on writing a plugin, but doesn't give any guidance on dos and don'ts. I'm seeing a lot of disparity in how different plugins address (or don't address) similar problems, and much of this code looks inefficient or incomplete. So maybe as we encounter these inconsistencies, we can document the "best" way to handle them and hopefully end up with a more consistent architecture.
Entirely agree, I've tried improving the situation by centralizing some logic into utilities, but hardly made a dent. The Plugin Architecture doc is meant for custom/external plugins, but some guidance can definitely live there. Things that reference internal SVGO code would probably belong in our contribution guide instead. |
With the current main branch code, along with PR #1956 and #1961, I can get down to no pixel mismatches by disabling 7 plugins.

I ran this with the default configuration (floatPrecision undefined) using all the regression files without duplicates. I ran it against the original files and against a second set consisting of the original files pre-processed with the removeDimensions plugin, which stripped out whitespace and replaced width/height with a viewBox. Many of the Oxygen icons have a width/height of 128x128, so they take up just a small corner of the screen. As far as the regressions are concerned, this has the effect of minimizing differences, so running against the preprocessed files tends to amplify any real visual differences. The preprocessed files take about 50% longer to run, presumably because there is more work to do in the rendering and/or screenshotting. The statistics below are based on the preprocessed files except where otherwise noted.

At this point I'm not aware of any bugs (other than the 2 above) which impact the regression files. Most of the problems are due to rounding. I'm sure there are places where we could do rounding better, but there's no way to have a completely safe configuration which includes rounding; it's inherently lossy, and extremely difficult to assess how it will affect rendering. Rounding should be opt-in. Most of the problematic plugins have rounding on by default, and often no way to disable it.

There are probably a few false positives; these all seem to involve rewriting paths. I've looked at a few that are simple enough that they seem to clearly be correct, but have slight visual differences. Usually they show up in files that are pretty complex, with lots of transforms and overlapping bits, and the mismatches are really sensitive to screen size, colors, and other seemingly unrelated parts of the file.

More details on the plugins involved:

convertShapeToPath
Enabling this plugin adds 1,115,287 bytes of compression. There are 15,517 mismatches in 30 files. I am not able to find any that are obviously bugs. All of the differences are very small, generally too small to see (even for files which have a large number of pixel differences). My guess is that the SVG engine processes

mergePaths
I think there might be some false positives here, but there are two issues that need to be addressed:

In its current state (with rounding on by default), mergePaths adds 20,424,490 bytes of compression, and introduces 3,424 pixel mismatches in 169 files. With rounding off, there is 13,918,409 bytes of additional compression, and 2,155 pixel mismatches in 39 files.

convertPathData
Enabling this plugin adds a lot of compression: 123,856,196 bytes. It also adds 152,275 pixel mismatches in 1,725 files. There's probably some amount of compression we could get safely by minifying paths, but in its current state this plugin should not be on by default.

convertTransform
Enabling this plugin adds 4,072,885 bytes of compression, and introduces 25,598 pixel mismatches in 889 files. There is no way to disable rounding in the current plugin. This should be off by default. There are many lossless optimizations that can be done, but they would require rewriting/refactoring the plugin.

cleanupNumericValues
Enabling this plugin adds 4,040,396 bytes of compression, and introduces 2,283 pixel mismatches in 66 files. Its main purpose is rounding. It should be off by default.

moveGroupAttrsToElems
Enabling this plugin reduces compression by 5,783,481 bytes (it actually makes files larger). It introduces 3,403 pixel mismatches in 14 files. I'm not sure what the point of this plugin is, but it seems like it should be off by default.

removeViewBox
This plugin causes 2 small failures in the unprocessed files, which I think are both false positives (they have a non-integral width/height). There are no mismatches in the preprocessed files (where width/height has been removed). But since almost everyone thinks this should be disabled by default, this seems like a great opportunity to do so.

Summary
Given the discussions in PR #1461 and issue #1360, it seems like we should:
This would make SVGO much easier for people to use, and give us a much more comprehensive testing environment as we fix bugs and re-implement some of the features in the disabled plugins.
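As a concrete illustration of the summary above, a default-safe configuration under these suggestions might look like the sketch below, using SVGO's preset-default overrides; this is illustrative, not a committed proposal:

```js
// svgo.config.js — sketch only: disables the seven plugins called out above.
module.exports = {
  plugins: [
    {
      name: 'preset-default',
      params: {
        overrides: {
          convertShapeToPath: false,
          mergePaths: false,
          convertPathData: false,
          convertTransform: false,
          cleanupNumericValues: false,
          moveGroupAttrsToElems: false,
          removeViewBox: false,
        },
      },
    },
  ],
};
```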
Would you mind adding reasoning? The change was basically a matter of semantics, so I don't see the problem.
I strongly disagree. Semi-lossy compression is a large part of SVGO's file size reductions, and disabling these plugins would have a bunch of people asking "is SVGO not working? My files are too big after the update." If you want it to be less lossy, we have a parameter for that: floatPrecision. (But yes, even with a high floatPrecision there are some mismatches that should be addressed.)
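For readers wanting the less-lossy route mentioned here, a sketch of raising floatPrecision via preset-default overrides follows; the value 5 is arbitrary, and only two of the rounding-heavy plugins are shown:

```js
// svgo.config.js — example: trade some compression for higher fidelity
// by raising floatPrecision on plugins that round numbers.
module.exports = {
  plugins: [
    {
      name: 'preset-default',
      params: {
        overrides: {
          convertPathData: { floatPrecision: 5 },
          cleanupNumericValues: { floatPrecision: 5 },
        },
      },
    },
  ],
};
```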
That seems to be a minority opinion, and is contrary to the stated goals of the project. The first paragraph of the SVGO README says "SVG files ... usually contain a lot of redundant information ... that can be safely removed or converted without impacting rendering." Also see the discussions in PR #1461 and issue #1360. I'm sure there are use cases where the value of increased compression outweighs the risk of changes to rendering, but the default should adhere to the stated goals of the project - things that can change rendering should be opt-in, not opt-out.
The semantics changed from "This plugin is not expected to change rendering and does not do any rounding. If you want the plugin to do rounding, add a --precision on the command line." to "This plugin rounds to 3 decimal places by default. In some cases this will change rendering. If you do not want the plugin to do rounding, write a custom configuration JavaScript file and include a --config on the command line". This is contrary to the stated goals of the project. Default behavior should be to not change rendering. |
I've been thinking about this since I joined as a maintainer. But rounding is seen as a core part of SVGO, so disabling rounding is off the table for now. Finding ways to make it safer, or making it less sensitive, can be considered, however.
I've seen these as well. I'd like to merge a few more pull requests and review more of the mismatches first, but we'll soon change how we handle the threshold of allowable differences.
I ran the regression tests before and after with default settings, and the PR looked fine. 🤔 Just checking, have you identified any actual issues here outside the stats?
I'll look into this.
Already slated for v4, but not sure when we'll release that. I'm keen to stay in v3 for a while, at least until we get the new regression tests merged, but probably longer.
I strongly disagree with this. Users are already used to the quirks of SVGO, so instead of rushing to resolve mismatches, we should take our time with fixing the issues that cause rendering changes. There is no rush to get this merged.
Partially agree with this. I hate to think of SVGO as lossy in any capacity, but the reality is that it is. That said, we try to be lossy in ways that are not visually perceivable at all, for example rounding away redundant decimal places. SVGs will always be affected to some degree by the changes we make; if we wanted zero difference across all SVGs, SVGO would never work, since antialiasing would screw over most transformations. The goal is to be "effectively" lossless. However, users are indeed already used to the optimizations we have, so we shouldn't be undoing any of them unless we can prove the optimization itself is flawed.
This is true, but we judge this based on whether a human will identify the difference, not whether a program will. And again, there is no rush to merge this. Let's take our time and stamp out the bugs.

TL;DR: I know you're eager to see this merged, and I am too. But let's not rush it. We don't want to make any big changes to SVGO that will confuse users just to get our improved QA merged faster.

It's worth noting that I always run the regression tests on
I did not see anything in the regression files that was an obvious visual difference.
Maybe when the testing finishes running, it could report the top 5 files in terms of mismatches.
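A minimal sketch of that reporting idea, assuming the runner collects per-file results into an array; the results shape and function name are hypothetical:

```js
// Hypothetical: `results` is an array of { file, mismatches } records
// accumulated while the regression run compares screenshots.
function reportWorstFiles(results, count = 5) {
  const worst = results
    .filter((r) => r.mismatches > 0)
    .sort((a, b) => b.mismatches - a.mismatches)
    .slice(0, count);
  for (const { file, mismatches } of worst) {
    console.warn(`${file}: ${mismatches} mismatched pixels`);
  }
}
```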
Actions are 100% free in public repos. No need to worry here.
Thanks for correcting me. I've just looked into it again and it turns out I misunderstood the pricing/billing page. I thought it was 2,000/3,000 free minutes including public repos, but you're right: it's actually 2,000 (or 3,000 in our case) free minutes for private repos, and unlimited for public repos.

Edit: I'd still like to run regression tests conditionally, just to avoid wasting computation/energy if lint/tests fail anyway. But good to know there isn't a quota to worry about.
!exclude.includes(name) &&
!name.startsWith('svgs/W3C_SVG_11_TestSuite/svg/animate-')
) {
  const file = path.join(baseDir, header.name);
Check failure: Code scanning / CodeQL
Arbitrary file access during archive extraction ("Zip Slip"). Severity: High (test)
The unsanitized archive entry name is used in a file system operation.
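CodeQL flags this because header.name comes straight from the archive. The usual mitigation for this class of finding is sketched below; this is the generic pattern, not necessarily the fix applied in this PR:

```js
const path = require('path');

// Generic "Zip Slip" guard: join, resolve, and confirm the result stays
// inside the extraction directory before any file system operation.
function safeJoin(baseDir, entryName) {
  const file = path.join(baseDir, entryName);
  const resolvedBase = path.resolve(baseDir);
  if (!path.resolve(file).startsWith(resolvedBase + path.sep)) {
    throw new Error(`Blocked path traversal in archive entry: ${entryName}`);
  }
  return file;
}
```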
I've decided to merge this already, and will just include the currently failing tests in the excluded array. Over time, we'll investigate these failing tests and resolve bugs or adjust the regression checks as necessary. For now, we need to ensure PRs are thoroughly tested, which the new test suite will help with a lot. Fortunately, most of the remaining failing tests are fairly minor anyway, but I'm still keen to empty out our excluded array soon, so will keep an eye on these. Thanks to everyone who helped with resolving the broken tests.
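For context, the exclusion mechanism referenced here works roughly like the sketch below, grounded in the exclude check visible in the diff above; the file name, loop, and comparison helper are illustrative, not the actual runner code:

```js
// Known-broken files stay listed here until their underlying bugs are
// fixed; the entry below is a placeholder, not the real list.
const exclude = [
  // 'svgs/oxygen-icons-5.113.0/scalable/actions/example.svg',
];

async function runAll(files) {
  for (const name of files) {
    if (exclude.includes(name)) continue; // skip known-broken files for now
    await compareScreenshots(name); // hypothetical per-file comparison
  }
}
```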
SVGO Test Suite
Expands the regression test task to include KDE's Oxygen Icons.
SVGO is in dire need of better quality assurance, and Oxygen Icons is a good library to use as it has relatively complex SVGs compared to other icon libraries.
This PR is to track progress. We need to fix cases that are broken in Oxygen Icons, and optimize SVGO, tests, and the pipeline so that we can complete the regression tests quicker. Since we started planning this, we've already taken the time down from 23.4 minutes.
Any help to fix tests reported in this PR or improve performance would be very welcome. Performance is particularly valuable right now, as it makes iterating on fixes faster. (The success criterion for a performance improvement is that the number of mismatches doesn't increase.)
Regression Action
This also updates the regression action to avoid running it when possible.
In the state it's in right now, it's 18 minutes per build. (One build per PR at minimum, but most PRs are pushed to multiple times, so there are usually multiple builds.) It'd be a lot of wasted computation/energy to run it redundantly.
I've made the following changes (a workflow sketch follows the list):

- Don't run the regression tests if the lint or test job failed anyway.
- Don't run the regression tests on push to main. We should no longer be pushing directly to main, so there's no need for this anymore. We only merge from pull requests.
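A sketch of how that gating might look in the workflow file; the job names, action versions, and script name are assumptions, not the PR's exact change:

```yaml
# Sketch only: gate the regression job on lint/test and on pull requests.
regression:
  needs: [lint, test] # skipped automatically if either job failed
  if: github.event_name == 'pull_request' # no longer runs on pushes to main
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
    - run: yarn install
    - run: yarn test:regression # hypothetical script name
```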
Broken Tests
See the GitLab repository for Oxygen Icons to download the files you're interested in fixing. Also note that files that are reported as broken could be caused by a browser bug or a quirk in our testing methodology.
List of files broken after optimization:
Related
Future
It may be worth collecting more meaningful statistics from our regression tests, as proposed in Add statistics to regression.js.